Large Scale Machine Learning
Stochastic Gradient Descent
Use 1 example in each iteration
- Randomly shuffle (reorder) training examples.
- Repeat {
    for $i := 1, \ldots, m$ {
      $\theta_j := \theta_j - \alpha \, (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$   (for every $j = 0, \ldots, n$)
    }
  }
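A minimal sketch of this update for linear regression, assuming NumPy; the function name, learning rate, and number of passes are illustrative choices, not values from the lecture:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=10):
    """One-example-at-a-time updates for linear regression.

    Assumes X already includes a bias column of ones.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        # Step 1: randomly shuffle (reorder) the training examples.
        for i in np.random.permutation(m):
            # Step 2: update theta using the single example (x^(i), y^(i)).
            error = X[i] @ theta - y[i]      # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]    # updates every theta_j at once
    return theta
```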
Mini-batch Gradient Descent
Use $b$ examples in each iteration.
Say $b = 10$, $m = 1000$.
Repeat {
  for $i = 1, 11, 21, 31, \ldots, 991$ {
    $\theta_j := \theta_j - \alpha \, \frac{1}{10} \sum_{k=i}^{i+9} (h_\theta(x^{(k)}) - y^{(k)}) \, x_j^{(k)}$   (for every $j = 0, \ldots, n$)
  }
}
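The same sketch adapted to mini-batches of $b = 10$ examples; as above, the name and hyperparameter defaults are illustrative:

```python
import numpy as np

def minibatch_gradient_descent(X, y, b=10, alpha=0.01, epochs=10):
    """Mini-batch updates: average the gradient over b examples per step."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        perm = np.random.permutation(m)
        for start in range(0, m, b):
            batch = perm[start:start + b]
            # Average the gradient over the b examples in this mini-batch.
            errors = X[batch] @ theta - y[batch]
            theta -= alpha * (X[batch].T @ errors) / len(batch)
    return theta
```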
Stochastic Gradient Descent Convergence
Checking for convergence
Batch gradient descent:
- Plot $J_{\mathrm{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$ as a function of the number of iterations of gradient descent.
Stochastic gradient descent:
- During learning, compute $\mathrm{cost}(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2} (h_\theta(x^{(i)}) - y^{(i)})^2$ before updating $\theta$ using $(x^{(i)}, y^{(i)})$.
- Every 1000 iterations (say), plot $\mathrm{cost}(\theta, (x^{(i)}, y^{(i)}))$ averaged over the last 1000 examples processed by the algorithm.
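One way this monitoring might look in code, under the same linear-regression assumption; the window of 1000 follows the lecture, everything else is illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

def sgd_with_cost_monitoring(X, y, alpha=0.01, window=1000):
    """SGD that records cost(theta, (x, y)) before each update."""
    m, n = X.shape
    theta = np.zeros(n)
    recent_costs, averaged = [], []
    for i in np.random.permutation(m):
        error = X[i] @ theta - y[i]
        # Compute cost on this example BEFORE updating theta with it.
        recent_costs.append(0.5 * error ** 2)
        theta -= alpha * error * X[i]
        if len(recent_costs) == window:
            # Plot one point per `window` iterations: the running average.
            averaged.append(np.mean(recent_costs))
            recent_costs = []
    plt.plot(averaged)
    plt.xlabel(f"iterations (x{window})")
    plt.ylabel(f"cost averaged over last {window} examples")
    plt.show()
    return theta
```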
Online Learning
e.g. 1
Shipping service website where a user comes and specifies an origin and destination; you offer to ship their package for some asking price, and users sometimes choose to use your shipping service ($y = 1$), sometimes not ($y = 0$). Features $x$ capture properties of the user, of the origin/destination, and the asking price. We want to learn $p(y = 1 \mid x; \theta)$ to optimize price.
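A possible shape for the learning loop, assuming logistic regression for $p(y = 1 \mid x; \theta)$: each arriving example is used for one gradient step and then discarded. The stream of user visits and the learning rate are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(theta, x, y, alpha=0.01):
    """One logistic-regression step on a single (x, y) pair.

    In online learning the example is used once and then discarded;
    there is no fixed training set stored anywhere.
    """
    error = sigmoid(theta @ x) - y   # h_theta(x) - y
    return theta - alpha * error * x

# Hypothetical serving loop: each visiting user yields one example.
# theta = np.zeros(n)
# for x, y in stream_of_user_visits():   # y = 1 if user bought shipping
#     theta = online_update(theta, x, y)
```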
e.g. 2
Product search (learning to search)
User searches for “Android phone 1080p camera”. Have 100 phones in store. Will return 10 results.
$x$ = features of phone, how many words in user query match name of phone, how many words in query match description of phone, etc.
$y = 1$ if user clicks on link; $y = 0$ otherwise.
Learn $p(y = 1 \mid x; \theta)$ (the predicted click-through rate).
Use this to show the user the 10 phones they’re most likely to click on.
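A sketch of the ranking step under the same logistic-regression assumption; how the per-phone feature vectors are built from the query is assumed, not specified in the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def top_phones(theta, phone_features, k=10):
    """Rank phones by predicted click-through rate p(y=1 | x; theta).

    `phone_features` is a (100, n) array: one feature vector x per phone,
    constructed from the user's query (hypothetical representation).
    """
    ctr = sigmoid(phone_features @ theta)   # p(y = 1 | x; theta) per phone
    return np.argsort(-ctr)[:k]             # indices of the k likeliest clicks
```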
e.g. ...
Other examples: choosing special offers to show user; customized selection of news articles; product recommendation; ...
Map-reduce and data parallelism
Many learning algorithms can be expressed as computing sums of functions over the training set.
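For example, the batch gradient descent update $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$ is a sum over the training set, so the sum can be split across (say) 4 machines, each computing a partial sum over its quarter of the data, and a central server combines the results. A locally simulated sketch of that split; in a real deployment each slice would live on a separate machine:

```python
import numpy as np

def mapreduce_gradient_step(X, y, theta, alpha=0.01, n_machines=4):
    """One batch-gradient step with the training-set sum split 4 ways."""
    m = X.shape[0]
    # "Map": each machine computes the gradient sum over its own slice.
    partials = []
    for X_part, y_part in zip(np.array_split(X, n_machines),
                              np.array_split(y, n_machines)):
        errors = X_part @ theta - y_part
        partials.append(X_part.T @ errors)   # sum of (h(x) - y) * x over slice
    # "Reduce": the central server adds the partial sums and updates theta.
    return theta - alpha * np.sum(partials, axis=0) / m
```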